---
title: "RagasEvaluator"
id: ragasevaluator
slug: "/ragasevaluator"
description: "This component evaluates Haystack pipelines using LLM-based metrics. It supports metrics like context relevance, factual accuracy, response relevance, and more."
---

# RagasEvaluator

This component evaluates Haystack pipelines using LLM-based metrics. It supports metrics like context relevance, factual accuracy, response relevance, and more.

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | On its own or in an evaluation pipeline. To be used after a separate pipeline has generated the inputs for the Evaluator. |
| **Mandatory init variables** | `metric`: A Ragas metric to use for evaluation |
| **Mandatory run variables** | `inputs`: A keyword arguments dictionary containing the expected inputs. The expected inputs will change based on the metric you are evaluating. See below for more details. |
| **Output variables** | `results`: A nested list of metric results. There can be one or more results, depending on the metric. Each result is a dictionary containing:  <br /> <br />- `name` - The name of the metric.  <br /> <br />- `score` - The score of the metric. |
| **API reference** | [Ragas](/reference/integrations-ragas) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/ragas |

</div>

Ragas is an evaluation framework that provides a number of LLM-based evaluation metrics. You can use the `RagasEvaluator` component to evaluate a Haystack pipeline, such as a retrieval-augmented generative pipeline, against one of the metrics provided by Ragas.

## Supported Metrics

Ragas supports a number of metrics, which we expose through the Ragas metric enumeration. Below is the list of metrics supported by the `RagasEvaluator` in Haystack with the expected `metric_params` while initializing the evaluator. Many metrics use OpenAI models and require an environment variable `OPENAI_API_KEY` to be set. For a complete guide on these metrics, visit the [Ragas documentation](https://docs.ragas.io/).

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | On its own or in an evaluation pipeline. To be used after a separate pipeline has generated the inputs for the Evaluator. |
| **Mandatory init variables** | `metric`: A Ragas metric to use for evaluation |
| **Mandatory run variables** | `inputs`: A keyword arguments dictionary containing the expected inputs. The expected inputs will change based on the metric you are evaluating. See below for more details. |
| **Output variables** | `results`: A nested list of metric results. There can be one or more results, depending on the metric. Each result is a dictionary containing:  <br /> <br />- `name` - The name of the metric.  <br /> <br />- `score` - The score of the metric. |
| **API reference** | [Ragas](/reference/integrations-ragas) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/ragas |

</div>

## Parameters Overview

To initialize a `RagasEvaluator`, you need to provide the following parameters :

- `metric`: A `RagasMetric`.
- `metric_params`: Optionally, if the metric calls for any additional parameters, you should provide them here.

## Usage

To use the `RagasEvaluator`, you first need to install the integration:

```bash
pip install ragas-haystack
```

To use the `RagasEvaluator` you need to follow these steps:

1. Initialize the `RagasEvaluator` while providing the correct `metric_params` for the metric you are using.
2. Run the `RagasEvaluator`, either on its own or in a pipeline, by providing the expected input for the metric you are using.

### Examples

#### Evaluate Context Relevance

To create a context-relevance evaluation pipeline:

```python
from haystack import Pipeline
from haystack_integrations.components.evaluators.ragas import RagasEvaluator, RagasMetric

pipeline = Pipeline()
evaluator = RagasEvaluator(
    metric=RagasMetric.ANSWER_RELEVANCY,
)
pipeline.add_component("evaluator", evaluator)
```

To run the evaluation pipeline, you should have the _expected inputs_ for the metric ready at hand. This metric expects a list of `questions` and `contexts`, which should come from the results of the pipeline you want to evaluate.

```python
results = pipeline.run({"evaluator": {"questions": ["When was the Rhodes Statue built?", "Where is the Pyramid of Giza?"],
                                                "contexts": [["Context for question 1"], ["Context for question 2"]]}})
```

#### Evaluate Context Relevance and Aspect Critique

To create a pipeline that evaluates context relevance and aspect critique:

```python
from haystack import Pipeline
from haystack_integrations.components.evaluators.ragas import RagasEvaluator, RagasMetric

pipeline = Pipeline()
evaluator_context = RagasEvaluator(
    metric=RagasMetric.CONTEXT_PRECISION,
)
evaluator_aspect = RagasEvaluator(
    metric=RagasMetric.ASPECT_CRITIQUE,
    metric_params={"name": "custom", "definition": "Is this answer problematic for children?", "strictness": 3},
)
pipeline.add_component("evaluator_context", evaluator_context)
pipeline.add_component("evaluator_aspect", evaluator_aspect)
```

To run the evaluation pipeline, you should have the _expected inputs_ for the metrics ready at hand. These metrics expect a list of `questions`, `contexts`, `responses`, and `ground_truths`. These should come from the results of the pipeline you want to evaluate.

```python
QUESTIONS = ["Which is the most popular global sport?", "Who created the Python language?"]
CONTEXTS = [["The popularity of sports can be measured in various ways, including TV viewership, social media presence, number of participants, and economic impact. Football is undoubtedly the world's most popular sport with major events like the FIFA World Cup and sports personalities like Ronaldo and Messi, drawing a followership of more than 4 billion people."],
                 ["Python, created by Guido van Rossum in the late 1980s, is a high-level general-purpose programming language. Its design philosophy emphasizes code readability, and its language constructs aim to help programmers write clear, logical code for both small and large-scale software projects."]]
RESPONSES = ["Football is the most popular sport with around 4 billion followers worldwide", "Python language was created by Guido van Rossum."]
GROUND_TRUTHS = ["Football is the most popular sport", "Python language was created by Guido van Rossum."]
results = pipeline.run({
        "evaluator_context": {"questions": QUESTIONS, "contexts": CONTEXTS, "ground_truths": GROUND_TRUTHS},
        "evaluator_aspect": {"questions": QUESTIONS, "contexts": CONTEXTS, "responses": RESPONSES},
})
```

## Additional References

🧑‍🍳 Cookbook: [Evaluate a RAG pipeline using Ragas integration](https://haystack.deepset.ai/cookbook/rag_eval_ragas)
